On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages
Abstract
Parallel programming languages represent a common theme in the evolution of high-performance computing (HPC) systems. Several parallel programming languages are directly associated with particular HPC systems. In this paper, we compare the performance of three commonly used parallel programming languages: OpenMP, MPI, and CUDA. Our performance evaluation is based on implementations of matrix multiplication algorithms. Matrix multiplication is chosen because of its wide application in many scientific and engineering fields, such as bioinformatics, linear algebra, and computer graphics. Our results show that CUDA programming delivers up to a 15-fold speedup relative to OpenMP and MPI programming. However, CUDA programming may prove comparatively more challenging to...
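As a rough illustration of the kind of kernel such a benchmark times, the following is a minimal sketch of a shared-memory matrix multiplication in C with an OpenMP directive. It is not the authors' implementation: the function name `matmul`, the row-major layout, and the restriction to square matrices are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiply two n x n row-major matrices, c = a * b.
 * When the compiler is invoked with OpenMP enabled (e.g. -fopenmp), the
 * rows of c are distributed across threads; otherwise the directive is
 * skipped and the loop runs serially with the same result. */
void matmul(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;  /* private to each (i, j) iteration */
            for (size_t k = 0; k < n; ++k) {
                sum += a[(size_t)i * n + k] * b[k * n + j];
            }
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

Each row of the result is independent of the others, so the outer loop can be split across threads without synchronization. An MPI or CUDA variant would distribute the same row (or tile) work across processes or GPU threads instead.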
Similar resources
A Comparative Study on Performance Benefits of Multi-core CPUs using OpenMP
Achieving scalable parallelism from general programs has not been successful to this point. Extracting parallelism from programs has become a key focus of interest for multi-core CPUs. Many techniques and programming models, such as MPI, CUDA, and OpenMP, have been adopted in order to exploit more performance. But there is an urge to find the best parallel programming techniques for the benefit of per...
Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.
Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains. In this paper, we introduce a novel platform for parallel computing using the MPI and OpenMP programming languages, based on a set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...
On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
CUDA-programmed GPUs are rapidly becoming a major choice in high-performance computing, and a growing number of applications are being ported to the CUDA platform. However, much less research has been carried out to evaluate the performance when CUDA is integrated with other parallel programming paradigms. We have developed a general purpose matrix multiplication algorithm and a C...
Workshare Process of Thread Programming and MPI Model on Multicore Architecture
A comparison between OpenMP for the thread programming model and MPI for the message-passing programming model will be conducted on multicore shared-memory machine architectures in order to find which has better performance in terms of speed and throughput. The application used to assess the scalability of the evaluated parallel programming solutions is matrix multiplication with customizable matrix dimens...
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes, and the viscous terms are discretized by the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parall...